84 research outputs found

    Association Rules Mining Based Clinical Observations

    Healthcare institutes continually enrich their repositories of patients' disease-related information, which could be made more useful through relational analysis. Data mining algorithms have proven useful for exploring correlations in large data repositories. In this paper we implement a novel idea, based on Association Rules mining, for finding co-occurrences of diseases carried by a patient using the healthcare repository. We have developed a system prototype for Clinical State Correlation Prediction (CSCP) which extracts data from the patients' healthcare database and transforms the OLTP data into a Data Warehouse by generating association rules. The CSCP system helps reveal relations among diseases: it predicts the correlation(s) between a primary disease (the disease for which the patient visits the doctor) and secondary disease(s) (other associated diseases carried by the same patient). Comment: 5 pages, MEDINFO 2010, C. Safran et al. (Eds.), IOS Press
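As a rough illustration of the association-rule idea in this abstract, the sketch below computes support and confidence for rules of the form primary disease -> secondary disease. The patient records, disease names, and thresholds are invented for illustration; they are not taken from the CSCP system or its data, and the actual system may generate rules differently.

```python
from itertools import permutations

# Hypothetical patient records: each set holds the diseases recorded for one patient.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "retinopathy"},
    {"diabetes", "retinopathy"},
    {"hypertension"},
    {"asthma"},
]


def support(itemset, records):
    """Fraction of records that contain every disease in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def pairwise_rules(records, min_support=0.2, min_confidence=0.6):
    """Return (primary, secondary, support, confidence) for rules primary -> secondary."""
    diseases = sorted(set().union(*records))
    rules = []
    for primary, secondary in permutations(diseases, 2):
        s_both = support({primary, secondary}, records)
        s_primary = support({primary}, records)
        if s_primary == 0 or s_both < min_support:
            continue
        # Confidence: how often the secondary disease appears given the primary one.
        confidence = s_both / s_primary
        if confidence >= min_confidence:
            rules.append((primary, secondary, s_both, confidence))
    return rules


for primary, secondary, s, c in pairwise_rules(records):
    print(f"{primary} -> {secondary}: support={s:.2f}, confidence={c:.2f}")
```

Only pairwise rules are enumerated here to keep the sketch short; a full miner such as Apriori would also consider larger itemsets.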

    DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

    Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Accurate identification of these disordered regions therefore has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0
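For readers unfamiliar with the RBF kernel mentioned in this abstract, the following minimal sketch evaluates the standard form K(x, y) = exp(-gamma * ||x - y||^2) on two feature vectors. The feature values and the gamma settings are made up for illustration; DisPredict's actual features and tuned parameters are not given here.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical per-residue feature vectors (values are made up).
residue_a = [0.2, 0.7, 0.1]
residue_b = [0.3, 0.5, 0.4]

# Larger gamma makes the similarity fall off faster with distance.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(residue_a, residue_b, gamma))
```

Optimizing an RBF-kernel SVM usually means searching over gamma together with the regularization parameter, which is one plausible reading of "optimized" in the title.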

    PCaAnalyser: A 2D-Image Analysis Based Module for Effective Determination of Prostate Cancer Progression in 3D Culture

    Three-dimensional (3D) in vitro cell-based assays for Prostate Cancer (PCa) research are rapidly becoming the preferred alternative to conventional 2D monolayer cultures. 3D assays more precisely mimic the microenvironment found in vivo, and thus are ideally suited to evaluate compounds and their suitability for progression in the drug discovery pipeline. To achieve the high throughput needed for most screening programs, automated quantification of 3D cultures is required. Towards this end, this paper reports on the development of a prototype analysis module for an automated high-content-analysis (HCA) system, which allows for accurate and fast investigation of in vitro 3D cell culture models for PCa. The Java-based program, which we have named PCaAnalyser, uses novel algorithms that allow accurate and rapid quantitation of protein expression in 3D cell culture. As currently configured, the PCaAnalyser can quantify a range of biological parameters, including nuclei count, nuclei-spheroid membership prediction, and various function-based classifications of peripheral and non-peripheral areas to measure expression of biomarkers and protein constituents known to be associated with PCa progression, and can define segregated cellular objects effectively for a range of signal-to-noise ratios. In addition, the PCaAnalyser architecture is highly flexible, operating as a single independent analysis as well as in batch mode, which is essential for High-Throughput Screening (HTS). The PCaAnalyser provides accurate and rapid analysis in an automated high-throughput manner, and reproducible analysis of the distribution and intensity of well-established markers associated with PCa progression in a range of metastatic PCa cell lines (DU145 and PC3) in a 3D model is demonstrated.

    Critical assessment of protein intrinsic disorder prediction

    Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
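The Fmax values quoted in this abstract can be read through the sketch below, which assumes that Fmax denotes the maximum F1 score obtained over all decision thresholds applied to per-residue scores. The scores and labels are made up, and the exact thresholding and averaging protocol used in CAID may differ.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when residues with score >= threshold are predicted disordered."""
    tp = sum(s >= threshold and l == 1 for s, l in zip(scores, labels))
    fp = sum(s >= threshold and l == 0 for s, l in zip(scores, labels))
    fn = sum(s < threshold and l == 1 for s, l in zip(scores, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_max(scores, labels):
    """Maximum F1 over thresholds taken from the observed scores."""
    return max(f1_at_threshold(scores, labels, t) for t in set(scores))


# Made-up per-residue disorder scores and reference labels (1 = disordered).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(round(f_max(scores, labels), 3))
```

Because Fmax picks the best threshold for each method, it compares predictors independently of how each one calibrates its scores.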

    Genetic algorithm for Ab initio protein structure prediction based on low resolution models

    A protein is a sequence of amino acids bonded into a linear chain that adopts a specific folded three-dimensional (3D) shape. This specific folded shape enables the protein to perform specific tasks. Amongst the various available computational methods, protein structure prediction by the ab initio approach is promising and can help to unravel the relationship between a sequence and its associated structure. This thesis focuses on ab initio protein structure prediction (PSP) by developing a novel Genetic Algorithm (GA) for an efficient and effective conformation search of low resolution models derived from the two-bead hydrophobic-hydrophilic (HP) models. The thesis also proposes a novel low resolution model, called the hHPNX model, which provides more accurate predictions than the existing low resolution HP models. As a search technique, GA shows promise in the complex search landscape for investigating the PSP problem. However, for longer sequences the performance of GA can deteriorate and cause the algorithm to frequently stall or become stuck in local minima. Therefore, in this thesis, a critical analysis of the working principle of GA (i.e., the schemata theorem) is presented. This analysis leads to the generalisation of the schemata theorem. The fallacies in the selection procedure of the schemata theorem are removed and its crossover operation is fully defined. A novel concept, the chromosome correlation factor (CCF), is proposed to identify similar chromosomes within the GA population, and the optimal value of CCF enables GA to perform effectively and thus helps provide superior results. Further, a non-isomorphic encoding algorithm is proposed for a bijective encoding within GA that prevents the expansion of the search landscape by maintaining a 1:1 relationship between the genotype and the phenotype.
    The non-isomorphic encoding reduces the chances of GA stalling and also prevents the tendency of the normal stochastic GA search to behave like a random search. Since the PSP solutions are compact in nature, the simple GA developed without any heuristics is further improved into a hybrid GA (HGA) by utilising domain-specific knowledge. For an optimal core cavity, we have defined likely sub-conformations to provide guided search. Further, the multi-objective formulation of the search problem can overcome possible stall or stuck conditions by backtracking effectively and performing efficiently. Novel and effective move operators are designed and applied to efficiently move part of the converging compact conformation and thus achieve overall superior results. The simplified HP model and its extension, the HPNX model, are effective in exploring the convoluted PSP search landscape quickly. With its simplicity maintained, the HPNX model is extended to a novel model called the hHPNX model, which reduces the amount of degeneracy and additionally captures the characteristics of two distinguished amino acids (Alanine and Valine) from the hydrophobic group. A corrected interaction potential matrix for an existing YhHX model is proposed, leading to its correct representation. Further, the face-centred-cube (FCC) model is shown to have the optimal lattice configuration for closely mapping the real folded protein. Three novel techniques are developed to compute the fitness function efficiently and so reduce the computation time. Most importantly, the improvement in the speed of computation is achieved without sacrificing the accuracy of the prediction. All the techniques are complementary to each other and can work concurrently, thereby reducing the computation time significantly.
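To make the HP model concrete, the sketch below scores a conformation with the usual HP energy: -1 for every pair of hydrophobic (H) residues that are lattice neighbours without being adjacent along the chain. It assumes a 2D square lattice and a move-string encoding (U, D, L, R) purely for brevity; the thesis works with other lattices such as FCC and with its own encodings, so this is an illustration and not the author's fitness function.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a move string into lattice coordinates, or None if the walk self-intersects."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        if (x, y) in coords:
            return None
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves):
    """HP-model energy: -1 for every pair of H residues that are lattice
    neighbours but not adjacent along the chain. Returns None for invalid folds."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    coords = fold(moves)
    if coords is None:
        return None
    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2 so chain neighbours are never counted as contacts.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


print(hp_energy("HPPH", "RUL"))  # compact fold
print(hp_energy("HPPH", "RRR"))  # fully extended chain
```

A genetic algorithm over such move strings would typically minimise this energy, treating self-intersecting folds (returned as None here) as infeasible.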

    Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence Alone for Structural Classification.

    A set of features computed from the primary amino acid sequence of proteins is crucial in the process of inducing a machine learning model that is capable of accurately predicting three-dimensional protein structures. Solutions for existing protein structure prediction problems are in need of features that can capture the complexity of molecular level interactions. With a view to this, we propose a novel approach to estimate the position specific estimated energy (PSEE) of a residue using contact energy and predicted relative solvent accessibility (RSA). Furthermore, we demonstrate that PSEE can be reasonably estimated based on sequence information alone. PSEE is useful in identifying the structured as well as the unstructured, or intrinsically disordered, regions of a protein by computing favorable and unfavorable energy respectively, characterized by an appropriate threshold. The most intriguing finding, verified empirically, is the indication that the PSEE feature can effectively classify disordered versus ordered residues and can segregate residues of different secondary structure types by computing the constituent energies. PSEE values for each amino acid strongly correlate with the hydrophobicity value of the corresponding amino acid. Further, PSEE can be used to detect the existence of critical binding regions that essentially undergo disorder-to-order transitions to perform crucial biological functions. Towards an application of disorder prediction using the PSEE feature, we have rigorously tested and found that a support vector machine model informed by a set of features including PSEE consistently outperforms a model with an identical set of features with PSEE removed. In addition, the new disorder predictor, DisPredict2, shows competitive performance in predicting protein disorder when compared with six existing disordered protein predictors.

    Performance of ordered and disordered residue classification based on per residue PSEE value calculated using different contact radius (CR) values.

    Classification performance is shown in terms of (A) ACC (blue bar), (B) PPV (purple bar) and (C) MCC (green bar) for CR values from 4 to 30. The x-axis and y-axis show the CR values and the performance metric values, respectively.
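The three metrics named in this caption have standard definitions in terms of confusion-matrix counts, sketched below. The counts in the usage line are invented for illustration and do not correspond to any result in the figure.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (ACC, PPV, MCC) from confusion-matrix counts."""
    # Accuracy: fraction of all residues classified correctly.
    acc = (tp + tn) / (tp + fp + tn + fn)
    # Positive predictive value (precision): correct share of positive calls.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, ppv, mcc


# Hypothetical counts for one contact-radius setting.
acc, ppv, mcc = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"ACC={acc:.3f} PPV={ppv:.3f} MCC={mcc:.3f}")
```

MCC is often reported alongside ACC because it stays informative when ordered and disordered residues are unevenly represented.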